Algorithm selection of off-policy reinforcement learning algorithm

Authors

Romain Laroche

Raphaël Féraud

Abstract

This paper formalises the problem of online algorithm selection in the context of Reinforcement Learning. The setup is as follows: given an episodic task and a finite number of off-policy RL algorithms, a meta-algorithm has to decide which RL algorithm is in control during the next episode so as to maximize the expected return. The article presents a novel meta-algorithm, called Epochal Stochastic Bandit Algorithm Selection (ESBAS). Its principle is to freeze the policy updates at each epoch, and to leave a rebooted stochastic bandit in charge of the algorithm selection. Under some assumptions, a thorough theoretical analysis demonstrates its near-optimality considering the structural sampling budget limitations. ESBAS is first empirically validated on a dialogue task under 32 algorithm configurations. Another experiment on a fruit collection task shows that ESBAS can successfully be adapted to a true online setting where algorithms update their policies after each transition.
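The abstract describes ESBAS only in words. The Python sketch below shows its epochal structure under several assumptions that are not taken from this page. Epoch β is assumed to last 2^β episodes. UCB1 is assumed as the stochastic bandit. `ToyAlgorithm` is a hypothetical stand-in whose frozen policy improves with the amount of shared data. This is an illustration, not the authors' implementation.

```python
import math
import random
from typing import List, Sequence, Tuple


class UCB1:
    """UCB1 bandit over algorithm indices; a fresh instance is created every epoch."""

    def __init__(self, n_arms: int) -> None:
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms
        self.t = 0

    def select(self) -> int:
        # Try every arm once before relying on the confidence bounds.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        return max(
            range(len(self.counts)),
            key=lambda a: self.sums[a] / self.counts[a]
            + math.sqrt(2.0 * math.log(self.t) / self.counts[a]),
        )

    def update(self, arm: int, reward: float) -> None:
        self.t += 1
        self.counts[arm] += 1
        self.sums[arm] += reward


class ToyAlgorithm:
    """Hypothetical stand-in for an off-policy RL algorithm (not from the paper)."""

    def __init__(self, asymptote: float, speed: float, rng: random.Random) -> None:
        self.asymptote = asymptote  # best achievable success rate
        self.speed = speed          # larger value = needs more data to learn
        self.rng = rng
        self.p = 0.0                # success rate of the current frozen policy

    def update(self, data: Sequence[Tuple[int, float]]) -> None:
        # Off-policy: learns from every episode, whichever algorithm controlled it.
        n = len(data)
        self.p = self.asymptote * n / (n + self.speed)

    def play_episode(self) -> float:
        # Episode return is 1 on success, 0 otherwise.
        return 1.0 if self.rng.random() < self.p else 0.0


def esbas(algorithms: List[ToyAlgorithm], n_epochs: int) -> List[int]:
    """Return the index of the algorithm put in control of each episode."""
    data: List[Tuple[int, float]] = []  # shared trajectories (here: just returns)
    choices: List[int] = []
    for beta in range(n_epochs):
        # Epoch start: every algorithm updates on all data, then its policy is frozen.
        for algo in algorithms:
            algo.update(data)
        bandit = UCB1(len(algorithms))  # rebooted stochastic bandit
        for _ in range(2 ** beta):      # assumed doubling epoch length
            k = bandit.select()
            ret = algorithms[k].play_episode()
            bandit.update(k, ret)
            data.append((k, ret))
            choices.append(k)
    return choices


# Usage: a fast learner with a low ceiling vs. a slow learner with a high ceiling.
rng = random.Random(0)
algos = [ToyAlgorithm(0.5, 1.0, rng), ToyAlgorithm(0.9, 50.0, rng)]
choices = esbas(algos, n_epochs=10)
last_epoch = choices[-2 ** 9:]
print(last_epoch.count(1), "of", len(last_epoch), "final-epoch episodes went to the slow learner")
```

Every algorithm is off-policy, so each one updates on all episodes, including those controlled by another algorithm. Freezing the policies within an epoch keeps each arm's return distribution stationary, which is the setting a stochastic bandit is designed for. Rebooting the bandit at each epoch then lets it track which algorithm is best at the current amount of data.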


Similar resources

Operation Scheduling of MGs Based on Deep Reinforcement Learning Algorithm

In this paper, the operation scheduling of Microgrids (MGs), including Distributed Energy Resources (DERs) and Energy Storage Systems (ESSs), is addressed using a Deep Reinforcement Learning (DRL) based approach. Due to the dynamic nature of the problem, it is first formulated as a Markov Decision Process (MDP). Next, the Deep Deterministic Policy Gradient (DDPG) algorithm is presented t...


Algorithm selection of reinforcement learning algorithms

Dialogue systems rely on a careful reinforcement learning (RL) design: the learning algorithm and its state space representation. In the absence of more rigorous knowledge, the designer relies on practical experience to choose the best option. In order to automate and improve this process, this article tackles the problem of online RL algorithm selection. A met...



Journal: CoRR

Volume: abs/1701.08810

Publication date: 2017
